Caption Text Recognition in Video Frames by MAP Matching
Authors
Abstract
This paper describes an approach to detecting caption text in video frames. Text recognition in video has many applications; however, problems remain, such as insufficient resolution and complex layouts and backgrounds. This study attempts to address these problems with a segmentation-free approach called the MAP matching method. Besides extending the method to grayscale images, the paper discusses a strategy for handling character size variation using Gaussian filtering and multi-sized reference patterns, as well as a method for detecting frames that contain caption text. Results show that the proposed matching method can detect characters of unknown size in caption text. Although over-detection is not negligible, verifying the positions of detected characters can identify the location of keywords with practical precision. Frames containing caption text are also shown to be detected with nearly 98% accuracy.
Similar resources
Visualizing Multimedia Content on Paper Documents: Components of Key Frame Selection for Video Paper
The components of a key frame selection algorithm for a paper-based multimedia browsing interface called Video Paper are described. Analysis of video image frames is combined with the results of processing the closed caption to select key frames that are printed on a paper document together with the closed caption. Bar codes positioned near the key frames allow a user to play the video from the...
Content Based Image and Video Retrieval Using Embedded Text
Extraction of text from images and video is an important step in building efficient indexing and retrieval systems for multimedia databases. We adopt a hybrid approach to such text extraction by exploiting a number of characteristics of text blocks in color images and video frames. Our system detects both caption text and scene text of different fonts, sizes, colors and intensities. We have d...
Precise News Video Text Detection/Localization Based on Multiple Frames Integration
This paper presents an approach based on multiple-frame integration to detect and localize static caption text in news videos. Utilizing the temporal information of videos, the algorithm includes robust text features and a non-text line deletion technique, and yields precise and tight localization of detected text regions. The Canny edge detector is first applied on reference frames and is fol...
Designing caption production rules based on face, text, and motion detection
Producing off-line captions for deaf and hearing-impaired people is a labor-intensive task that can require up to 18 hours of production per hour of film. Captions are placed manually close to the region of interest, but they must avoid masking human faces, text, or any moving objects that might be relevant to the story flow. Our goal is to use image processing techniques to reduce the off-lin...
Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study
Despite considerable advances in recognizing hand gestures from still images, many challenges remain in the classification of hand gestures in videos. The latter comes with additional challenges, including higher computational complexity and the arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...
Journal title:
Volume, issue:
Pages: -
Publication date: 2003